Abstract

We construct a statistical ensemble of games in which each independent
subensemble consists of two players playing the same game. We derive the
mean payoffs per move of the representative players of the game, and we
evaluate all the deterministic policies with finite memory. In particular,
we show that if one of the players has a generalized tit-for-tat policy,
the mean payoffs per move of both players are forced to be equal. In the
case of symmetric, non-cooperative and dilemmatic games, we show that
generalized tit-for-tat policies, together with the condition of not being
the first to defect, lead to the highest mean payoffs per move for the
players.



1 Introduction 

Game theory was formalized by von Neumann and Morgenstern in 1944, [1].
Their objective was to introduce into the language of economic theory some
mathematical tools for the quantitative analysis of the behaviour of economic
agents without a central authority. One of von Neumann and Morgenstern's
arguments in favour of the usefulness of a theory of games is based on the
intrinsically limited knowledge about the facts that economists deal with.
This argument has also been applied to the description of some physical
systems and to evolutionary theories in biology.

In the context of evolutionary biology and in order to analyse the logic
of animal conflict, Maynard Smith [2] introduced a game theory approach to
describe some of the evolutionary features of organisms. In the framework
of sociology, Axelrod [3] gave several examples where the game theoretical
framework is useful. In economics, there is today a vast literature on the
applicability of the game theoretical approach to economic decisions [4], [5]
and [6]. More recently, the same type of formalism has been applied to
quantum mechanics [7].

In a game, a policy is a rule of decision for each player, and policies can
be deterministic, depending on the previous choices of one or of both players,
or can be stochastic. In a two-player non-cooperative game with a finite
number of choices or pure strategies, both players know the payoffs, make
their choices independently of each other, and know the past history of their
choices. It is also assumed that each player maximizes its payoff after a finite
or an infinite number of choices or moves.

An important problem in game theory is to determine which policies perform
better than others. In this context, Axelrod, [3], proposed the following
problem: "Under what conditions will cooperation emerge in a world of
egoists without central authority?". To help answer this question, a computer
tournament was organized to decide which policy would perform best
in an iterated Prisoner's Dilemma game, a game introduced by Dresher, Flood
and Tucker [8]. The tournament was won by the tit-for-tat (TFT) policy,
submitted by Rapoport, [3, p. 31]. The TFT policy consists of a simple
rule: one's current move is equal to the other player's previous move.

In fact, several approaches have been developed in order to decide which
policies perform better than others in infinitely iterated games. One of these
approaches relies on the concept of mixed strategy. In a game with several
possible choices or moves, a player has a mixed strategy if he has a probability
profile associated with all the possible moves of the game. Based on the concept
of mixed strategy, the replicator dynamics approach, [9] and [10], postulates
an evolution equation for the probability profiles of each player's moves. This
evolution equation implies a precise type of rationality of the players, and
the mixed strategy concept carries an underlying infinite memory associated
with the choices of the players.

The formal construction of game theory depends on the relation between
players and on the source of their payoffs. For example, we can
formalize a two-player game in such a way that the payoffs won by one player
are the losses of the other, [11]. Another approach is to consider that the
players' payoffs are obtained from external sources. Our construction applies
to the second case, and therefore to games describing the global behaviour of
systems in economics, sociology and evolutionary biology.

The aim of this paper is to derive the mean payoffs of the 'representative
players' of a game, and to formulate the problem of deciding which policy
performs better than another in iterated non-cooperative games.

This paper is organized as follows. In Section 2 we introduce some of the
definitions that will be used throughout this paper, and we analyse and interpret
iterated non-cooperative games from the point of view of dynamical systems
theory. In order to evaluate games and deterministic strategies with finite
memory, we take the point of view of the uniform ensembles of statistical
mechanics and we introduce the concept of representative ensemble of a game.
In this context, the players of the infinite set of games are replaced by the
'representative agents' of the game.

In Section 3 we consider the case of a uniform ensemble of games, where
in each subensemble we have two players playing the same game. The mean
value of the payoffs per move taken over the uniform ensemble is calculated,
and gives information about the performance of a game. In Section 4 we
evaluate the performance of deterministic strategies with finite memory length.
In the case where in each subensemble a player has a deterministic strategy
and the other makes his choices with equal probabilities, we calculate the
ensemble averages of the payoffs per move. The main results of Sections 3
and 4 are summarized in Theorems 3.1 and 4.1 and Corollary 4.2. In
particular, we show that if one of the players has a generalized tit-for-tat policy,
the mean payoff per move of both players is the same. Therefore, generalized
tit-for-tat is the best policy against exploitation.

In Section 5 we consider the case where the opponent players have
deterministic policies, within the same memory class. In this case, the game
dynamics is a deterministic process, and the mean payoffs per move depend
on the initial moves of both players and on the policy functions of both
players. Comparing all the possible deterministic strategies with memory length
1, we prove that, in dilemmatic games, the generalized tit-for-tat policy,
together with the condition of not being the first to defect, leads to the highest
possible mean payoffs per move for the players.

In Section 6 we apply the formalism developed in this paper to the
Prisoner's Dilemma and to the Hawk-Dove games, and we analyse their state
space structure. Finally, in Section 7 we summarize the main conclusions of
the paper.

2 Formalism and definitions 

We take a two-player game with two possible choices or moves (two pure
strategies). At times $n \geq 1$, each player chooses, independently of the other,
one of the two possible pure strategies. These pure strategies are represented
by the symbols '0' and '1'. We denote by $S = \{0, 1\}$ the set of pure strategies,
and by P and Q the two players. After a move, each player receives a profit or
payoff that depends on the opponent's move. The payoff matrices of the
game are:

$$A = \begin{pmatrix} A_{00} & A_{01} \\ A_{10} & A_{11} \end{pmatrix}, \qquad B = \begin{pmatrix} B_{00} & B_{01} \\ B_{10} & B_{11} \end{pmatrix}$$

where the payoff of P is $A_{ij}$ if player P plays $i$ and Q plays $j$. In the same
move, Q has payoff $B_{ij}$. If each player makes its choice independently of
the other, we are in the context of non-cooperative games. If $B = A^T$, the
two-player game is symmetric. In the following, we analyze only the case of
symmetric and non-cooperative games.

In a two-player symmetric game, we say that a pure strategy $i \in S$ is
dominant, [12], if $A_{ij} \geq A_{kj}$ for every $j = 0, 1$, and $A_{ij} > A_{kj}$ for some $j$ and
$k \neq i$. For example, the symmetric and non-cooperative games with payoff
matrices,

$$A_1 = \begin{pmatrix} 3 & 0 \\ 5 & 1 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 1 & 0 \\ 5 & 3 \end{pmatrix}$$

have '1' as dominant strategy ($A_{10} > A_{00}$ and $A_{11} > A_{01}$). In the first game,
if both players choose the dominant strategy '1', the payoff of each is 1. In
the second game, the payoff of each player is 3, and the dominant strategy is
the right choice for both players. However, as $A_{00} > A_{11}$ for the first game, if
both players choose the non-dominant strategy, their individual payoffs per
move are higher than when both players choose the dominant strategy.

These two examples suggest the following definition: A symmetric
two-player game is dilemmatic if either,

$$A_{10} > A_{00} > A_{11} > A_{01} \quad \text{and} \quad 2A_{00} > A_{01} + A_{10} \tag{2.1}$$

or,

$$A_{01} > A_{11} > A_{00} > A_{10} \quad \text{and} \quad 2A_{11} > A_{10} + A_{01} \tag{2.2}$$

where the second inequalities in (2.1) and (2.2) have been introduced in order
to favour the non-dominant strategy.

In the first case of a dilemmatic game, (2.1), the strategy '1' is dominant. 
In the second case, (2.2), '0' is the dominant strategy. If both players choose 
the dominant strategy in one move, they get smaller payoffs than the ones 
they could have obtained if both had chosen the non-dominant strategy. 

In an iterated game with a fixed payoff matrix, players are always playing
the same game, and their payoffs accumulate. Therefore, a two-player
iterated game is described by the two sequences of pure strategies of each
player,

$$\begin{aligned} \mu &= (\mu_1, \mu_2, \ldots, \mu_n, \ldots) \\ \sigma &= (\sigma_1, \sigma_2, \ldots, \sigma_n, \ldots) \end{aligned} \tag{2.3}$$

where $\mu_n$ and $\sigma_n$ represent the choices of the players P and Q, respectively,
at discrete time $n \geq 1$, and $\mu_n, \sigma_n \in S$. The sequences (2.3) completely
specify the accumulated payoffs of both players. We call $\mu$ and $\sigma$ the game
record sequences. In an infinitely iterated game, the accumulated payoff of
the players can be infinite. The mean payoffs per move are always finite and,
for a symmetric game, they are given by,

$$\begin{aligned} G_p &= \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} A_{\mu_i \sigma_i} \\ G_q &= \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} A_{\sigma_i \mu_i} \end{aligned} \tag{2.4}$$

where $G_p$ and $G_q$ are the mean payoffs of players P and Q, respectively.
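As an illustration of (2.4), the following minimal sketch computes the mean payoffs per move for finite game records, that is, the averages before the limit is taken. The function name `mean_payoffs` and the matrix `A_example` are hypothetical choices made only for this example, and integer payoffs are assumed so that exact fractions can be used.

```python
from fractions import Fraction

def mean_payoffs(A, mu, sigma):
    """Finite-n version of (2.4): mean payoff per move of P and Q.

    A     -- 2x2 payoff matrix of the symmetric game, A[i][j] with integer entries
    mu    -- moves of player P (sequence of 0/1)
    sigma -- moves of player Q (same length as mu)
    """
    if len(mu) != len(sigma) or len(mu) == 0:
        raise ValueError("records must be non-empty and of equal length")
    n = len(mu)
    # P receives A[mu_i][sigma_i]; in a symmetric game Q receives A[sigma_i][mu_i].
    g_p = Fraction(sum(A[m][s] for m, s in zip(mu, sigma)), n)
    g_q = Fraction(sum(A[s][m] for m, s in zip(mu, sigma)), n)
    return g_p, g_q

# Illustrative matrix (hypothetical values, not taken from the text).
A_example = [[2, 0], [3, 1]]
print(mean_payoffs(A_example, [0, 1, 1, 0], [0, 0, 1, 1]))
```

In this small record each player is exploited once and exploits once, so both finite averages coincide.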

An example of a symmetric, non-cooperative, and dilemmatic game is
the Prisoner's Dilemma game. In this game, we have two players with two
possible pure strategies, '0' and '1', and we have chosen the payoff matrix,

$$A = \begin{pmatrix} 3 & -9 \\ 11 & -5 \end{pmatrix} \tag{2.5}$$

As $A_{10} > A_{00} > A_{11} > A_{01}$ and $2A_{00} > A_{01} + A_{10}$, the Prisoner's Dilemma
game is dilemmatic with '1' as dominant pure strategy. The pure strategy
'0' corresponds to cooperation and the pure strategy '1' to defection. For
a discussion of the importance of dilemmatic games and of the Prisoner's
Dilemma game, see Axelrod, [3].
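The dominance and dilemma conditions can be checked mechanically. The sketch below is only illustrative: the function names are hypothetical, the first inequality in the definition of dominance is read as non-strict (an assumption), and the Prisoner's Dilemma entries $A_{00} = 3$, $A_{01} = -9$, $A_{10} = 11$, $A_{11} = -5$ are the values consistent with the expressions in (2.12).

```python
def dominant_strategy(A):
    """Return the dominant pure strategy (0 or 1) of a symmetric 2x2 game, or None."""
    for i in (0, 1):
        k = 1 - i
        # Assumption: weak inequality for every j, strict inequality for some j.
        weakly_better = all(A[i][j] >= A[k][j] for j in (0, 1))
        strictly_better = any(A[i][j] > A[k][j] for j in (0, 1))
        if weakly_better and strictly_better:
            return i
    return None

def is_dilemmatic(A):
    """Check whether A satisfies condition (2.1) or condition (2.2)."""
    case_1 = (A[1][0] > A[0][0] > A[1][1] > A[0][1]
              and 2 * A[0][0] > A[0][1] + A[1][0])
    case_2 = (A[0][1] > A[1][1] > A[0][0] > A[1][0]
              and 2 * A[1][1] > A[1][0] + A[0][1])
    return case_1 or case_2

# Prisoner's Dilemma entries consistent with (2.12).
A_pd = [[3, -9], [11, -5]]
print(dominant_strategy(A_pd), is_dilemmatic(A_pd))
```

A game in which the dominant strategy is also the jointly best choice, such as the hypothetical matrix `[[1, 0], [5, 3]]`, has a dominant strategy but fails both conditions.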

Following von Neumann and Morgenstern [1], a strategy or policy is a set of
rules that tells each participant how to behave in every situation which may
arise. The only sources of information available to the players are the set of all
possible moves, their possible payoffs, and the history of the previous moves
of both players. To describe a rule of decision, policy, or strategy we can
adopt the von Neumann-Morgenstern view, where a rule of decision is specified
through the knowledge of a function of the $m$ previous moves.

Deterministic strategy: In an iterated two-player game, with game
records $\mu$ and $\sigma$ for players P and Q, respectively, a rule of decision or a
deterministic strategy with memory length $m \geq 1$ for player P is a function
$f : S^m \to S$ such that,

$$\mu_i = f(\sigma_{i-m}, \ldots, \sigma_{i-1}) \tag{2.6}$$

for every $i > m$. Analogously, player Q has a deterministic strategy with
memory length $n \geq 1$, if there exists a function $g : S^n \to S$ such that,

$$\sigma_i = g(\mu_{i-n}, \ldots, \mu_{i-1})$$

for every $i > n$.

In the following, deterministic policy and deterministic strategy have the
same meaning. In some game theory texts, the word 'strategy' refers to 'pure
strategy', an element of the set $S = \{0, 1\}$, and in other contexts it refers to
policies, as in 'tit-for-tat strategy'.

By definition, the outcome of a player's choice or move at time $i \geq m + 1$
is determined by a finite number of previous moves of the other. In general,
we can take the functions $f : S^{m+n} \to S$ and $g : S^{r+s} \to S$, and set,

$$\begin{aligned} \mu_i &= f(\sigma_{i-m}, \ldots, \sigma_{i-1}, \mu_{i-n}, \ldots, \mu_{i-1}) \\ \sigma_i &= g(\mu_{i-r}, \ldots, \mu_{i-1}, \sigma_{i-s}, \ldots, \sigma_{i-1}) \end{aligned}$$

In the following we will only analyze the case where each player's choice
depends on a finite number of previous moves of the other, and $m = n$.

For example, adopting the definition of the tit-for-tat (TFT) strategy
given in the introduction, a TFT strategy with memory length $m = 1$ is
described by the boolean identity function $f : S \to S$, defined by,

$$f(0) = 0 \quad \text{and} \quad f(1) = 1$$

Generalized tit-for-tat strategy (GTFT): We say that $f : S^m \to S$ is
a generalized tit-for-tat strategy with memory length $m$, if the numbers of '0'
and '1' in $f(\sigma^{(m)})$, when $\sigma^{(m)}$ runs over the set $S^m$, are equal. More formally,
$f : S^m \to S$ is a generalized tit-for-tat strategy with memory length $m$, if,

$$\#\{\sigma^{(m)} \in S^m : f(\sigma^{(m)}) = 0\} = \#\{\sigma^{(m)} \in S^m : f(\sigma^{(m)}) = 1\}$$

where $\sigma^{(m)} = (\sigma_1, \ldots, \sigma_m)$, and $\sigma_i \in S$.

To solve a game means to find a procedure that determines, for each
player's choice, the most favourable result ([1], [11] and [13]). In this
context, the concepts of mixed strategy and equilibrium state of a game are
fundamental tools in game theory.

A mixed strategy is a collection of probabilities associated with each player
and its pure strategies. The players P and Q have mixed strategies $s_p =
(s_{0p}, s_{1p})$ and $s_q = (s_{0q}, s_{1q})$, if each player plays pure strategy $i$ with
probability $s_{i*}$. Obviously, $s_{0*} + s_{1*} = 1$.

In a symmetric game with two pure strategies and mixed strategies $s_p$ and
$s_q$ for players P and Q, respectively, the mean payoffs per move of players P
and Q are,

$$\begin{aligned} P: \; E(s_p|s_q) &= s_{0p}(s_{0q}A_{00} + s_{1q}A_{01}) + s_{1p}(s_{0q}A_{10} + s_{1q}A_{11}) \\ &= s_{0p}s_{0q}A_{00} + s_{0p}s_{1q}A_{01} + s_{1p}s_{0q}A_{10} + s_{1p}s_{1q}A_{11} \\ Q: \; E(s_q|s_p) &= s_{0q}(s_{0p}A_{00} + s_{1p}A_{01}) + s_{1q}(s_{0p}A_{10} + s_{1p}A_{11}) \\ &= s_{0q}s_{0p}A_{00} + s_{0q}s_{1p}A_{01} + s_{1q}s_{0p}A_{10} + s_{1q}s_{1p}A_{11} \end{aligned} \tag{2.7}$$
The time evolution of a game with mixed strategies $s_p$ and $s_q$ can be
seen as a stochastic process with two independent random variables $X$ and
$Y$. The random variables $X$ and $Y$, associated with players P and Q,
respectively, have mean values given by (2.7). More precisely, $X$ can assume the
values $A_{00}$, $A_{01}$, $A_{10}$, $A_{11}$ with probabilities $s_{0p}s_{0q}$, $s_{0p}s_{1q}$, $s_{1p}s_{0q}$ and $s_{1p}s_{1q}$,
respectively. Analogously, $Y$ takes values in the same set, with probabilities
$s_{0q}s_{0p}$, $s_{0q}s_{1p}$, $s_{1q}s_{0p}$ and $s_{1q}s_{1p}$. Therefore, the deviations from the mean
payoffs per move of the players, or the fluctuations from the mean values,
are characterised by the variances,

$$\begin{aligned} \sigma_p^2(s_p|s_q) &= s_{0p}s_{0q}(A_{00} - E(s_p|s_q))^2 + s_{0p}s_{1q}(A_{01} - E(s_p|s_q))^2 \\ &\quad + s_{1p}s_{0q}(A_{10} - E(s_p|s_q))^2 + s_{1p}s_{1q}(A_{11} - E(s_p|s_q))^2 \\ \sigma_q^2(s_q|s_p) &= s_{0q}s_{0p}(A_{00} - E(s_q|s_p))^2 + s_{0q}s_{1p}(A_{01} - E(s_q|s_p))^2 \\ &\quad + s_{1q}s_{0p}(A_{10} - E(s_q|s_p))^2 + s_{1q}s_{1p}(A_{11} - E(s_q|s_p))^2 \end{aligned} \tag{2.8}$$

In general, $E(s_p|s_q) \neq E(s_q|s_p)$, but, by a straightforward calculation, $\sigma^2(s_p|s_q)$

Imposing the condition $E(s_p|s_q) = E(s_q|s_p)$ in (2.7), a game or a mixed
strategy is equalitarian if either $A_{01} = A_{10}$, or $s_p = s_q$.

To characterise the dynamics of an iterated symmetric game, we introduce
the concept of phase or state space of a game. The state space of a two-player
game is the convex closure of the points $(A_{00}, A_{00})$, $(A_{11}, A_{11})$, $(A_{01}, A_{10})$ and
$(A_{10}, A_{01})$, in the two-dimensional space of the payoffs of players P and Q.
Let us denote by $\mathcal{K}$ the state space of a game. As $s_{0p}, s_{0q} \in [0,1]$, then
$(E(s_p|s_q), E(s_q|s_p)) \in \mathcal{K}$. In Fig. 1 we show the state space $\mathcal{K}$ for the
Prisoner's Dilemma game with payoff matrix (2.5).

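In an iterated game, the mean initial (at time $n = 1$) payoffs per move of
both players are $(x_1, y_1) = (A_{\mu_1\sigma_1}, A_{\sigma_1\mu_1}) \in \mathcal{K}$. By (2.4) and after $n + 1$ moves,
the mean payoffs per move of both players are,

$$\begin{aligned} (x_{n+1}, y_{n+1}) &= \left( \frac{1}{n+1} \sum_{i=1}^{n+1} A_{\mu_i\sigma_i}, \; \frac{1}{n+1} \sum_{i=1}^{n+1} A_{\sigma_i\mu_i} \right) \\ &= \left( \frac{n}{n+1} x_n + \frac{1}{n+1} A_{\mu_{n+1}\sigma_{n+1}}, \; \frac{n}{n+1} y_n + \frac{1}{n+1} A_{\sigma_{n+1}\mu_{n+1}} \right) \in \mathcal{K} \end{aligned} \tag{2.9}$$

and the iterated two-player game is dynamically described by a (non-deterministic)
one-to-many map $\beta : \mathcal{K} \to \mathcal{K}$, [14]. The equilibrium point or equilibrium
solution of a game is the point $\lim_{n\to\infty} (x_n, y_n)$.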

For a given mixed strategy profile $s_p$ and $s_q$ of players P and Q, the
iterated two-player game has the equilibrium point $(E(s_p|s_q), E(s_q|s_p)) \in \mathcal{K}$.
As $s_{0p}, s_{0q} \in [0, 1]$, the set of equilibrium states of the map $\beta : \mathcal{K} \to \mathcal{K}$ spans
all the state space of a game.

Figure 1: State space $\mathcal{K}$ of the Prisoner's Dilemma game with payoff matrix
(2.5), where $x_n$ and $y_n$ are the mean payoffs per move of players P and Q,
as calculated from (2.9). The dots represent the mean payoff per move of an
instance of the iterated game with both players choosing strategies '0' and
'1' with equal probabilities ($s_p = s_q = (1/2, 1/2)$). By (2.7), the equilibrium of the
game is the point $(0, 0)$, and the standard deviations of the mean payoffs of
the players are $\sigma_p = \sigma_q = \sqrt{59} \approx 7.68$, calculated from (2.8).

For example, in a two-player game with the mixed strategy profiles $s_p =
(1/2, 1/2)$ and $s_q = (1/2, 1/2)$, by (2.7), the equilibrium point of the game is,

$$\left( \tfrac{1}{4}(A_{00} + A_{01} + A_{10} + A_{11}), \; \tfrac{1}{4}(A_{00} + A_{01} + A_{10} + A_{11}) \right) \tag{2.10}$$

By (2.8), the fluctuations around the equilibrium are,

$$\begin{aligned} \sigma_p^2((1/2, 1/2)|(1/2, 1/2)) &= \sigma_q^2((1/2, 1/2)|(1/2, 1/2)) \\ &= \tfrac{1}{16}\left( 3A_{00}^2 + 3A_{01}^2 + 3A_{10}^2 + 3A_{11}^2 \right. \\ &\quad \left. - 2A_{00}(A_{01} + A_{10} + A_{11}) - 2A_{11}(A_{01} + A_{10}) - 2A_{01}A_{10} \right) \end{aligned} \tag{2.11}$$

In Fig. 1 we represent several iterates of the map $\beta$ for the Prisoner's
Dilemma game with payoff matrix (2.5), and mixed strategies $s_p = s_q =
(1/2, 1/2)$. In the limit $n \to \infty$, $(x_n, y_n) \to (0, 0)$. The fluctuations from
equilibrium have standard deviations $\sigma_p = \sigma_q = \sqrt{59} \approx 7.68$.
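The numbers quoted for the equal-probability case can be reproduced directly from (2.7) and (2.8). The following sketch is an illustration only: the helper name `payoff_stats` is hypothetical, and the matrix entries are those consistent with (2.12). By the symmetry of (2.7), the statistics of player Q are obtained by exchanging the two strategy arguments.

```python
from fractions import Fraction

def payoff_stats(A, s_own, s_other):
    """Mean (2.7) and variance (2.8) of a player's payoff per move.

    s_own, s_other -- mixed strategies (probability of '0', probability of '1')
    """
    probs = {(i, j): s_own[i] * s_other[j] for i in (0, 1) for j in (0, 1)}
    mean = sum(p * A[i][j] for (i, j), p in probs.items())
    var = sum(p * (A[i][j] - mean) ** 2 for (i, j), p in probs.items())
    return mean, var

half = (Fraction(1, 2), Fraction(1, 2))
A_pd = [[3, -9], [11, -5]]  # entries consistent with (2.12)

mean_p, var_p = payoff_stats(A_pd, half, half)
print(mean_p, var_p, float(var_p) ** 0.5)
```

With these entries the mean vanishes and the variance equals 59, in agreement with the equilibrium point and the standard deviation stated in the text.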

A mixed strategy is a strict Nash equilibrium solution of a game if P and
Q maximize their payoffs per move independently of each other. A mixed
strategy is a Nash bargain equilibrium solution of a game if $E(s_p|s_p)$ is a
maximum.

In the case of the Prisoner's Dilemma game with payoff matrix
(2.5), by (2.7), we have,

$$\begin{aligned} P: \; E(s_p|s_q) &= -5 - 4s_{0p} + 16s_{0q} - 4s_{0p}s_{0q} \\ Q: \; E(s_q|s_p) &= -5 - 4s_{0q} + 16s_{0p} - 4s_{0p}s_{0q} \end{aligned} \tag{2.12}$$

Maximizing $E(s_p|s_q)$ with respect to $s_{0p}$ and $E(s_q|s_p)$ with respect to $s_{0q}$, the strict
Nash equilibrium of the game is obtained when both players choose the mixed
strategy $(s_{0p}, s_{1p}) = (s_{0q}, s_{1q}) = (0, 1)$. In this case, the Nash equilibrium
state of the map $\beta : \mathcal{K} \to \mathcal{K}$ is the point $(-5, -5) \in \mathcal{K}$. The Nash bargain
solution of the game is obtained from (2.12) with $s_p = s_q$, and is the point
$(3, 3) \in \mathcal{K}$, Fig. 1. As both Nash solutions correspond to the choices of pure
strategies with probability 1, by (2.8), the fluctuations of the iterated game
have zero standard deviations. In general, an n-person non-cooperative game
always has a strict Nash equilibrium, [13].
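The maximization leading to the point $(-5, -5)$ can be checked numerically. The sketch below, under the assumption that the payoff matrix has the entries implied by (2.12), verifies on a coarse grid that (2.7) reproduces (2.12) and that a player's payoff is largest at $s_{0p} = 0$ whatever the opponent does. The function names and the grid resolution are illustrative choices.

```python
def expected_payoff(A, s0_own, s0_other):
    """E(s_own | s_other) from (2.7), written in terms of the probabilities of playing '0'."""
    own = (s0_own, 1 - s0_own)
    other = (s0_other, 1 - s0_other)
    return sum(own[i] * other[j] * A[i][j] for i in (0, 1) for j in (0, 1))

def closed_form(s0_own, s0_other):
    """Right-hand side of (2.12)."""
    return -5 - 4 * s0_own + 16 * s0_other - 4 * s0_own * s0_other

A_pd = [[3, -9], [11, -5]]  # entries implied by (2.12)
grid = [k / 10 for k in range(11)]

# (2.7) evaluated with this matrix agrees with (2.12) on the whole grid.
agree = all(abs(expected_payoff(A_pd, a, b) - closed_form(a, b)) < 1e-9
            for a in grid for b in grid)

# Best response on the grid: for every opponent mix b, the maximizing s0_own.
best_responses = {b: max(grid, key=lambda a: expected_payoff(A_pd, a, b))
                  for b in grid}
print(agree, set(best_responses.values()))
```

The payoff is linear and decreasing in the player's own probability of cooperating, which is why the best response does not depend on the opponent's mix.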

The choice of a mixed strategy profile for a game has the advantage that
the iterates of the map $\beta : \mathcal{K} \to \mathcal{K}$ converge to the equilibrium solution
$E(s_p|s_q)$. However, the choice of a mixed strategy profile implies that both
players have infinite memory, which, in real situations, is difficult or even
impossible to fulfil.

On the other hand, in some game theory approaches describing the global
behaviour of economic, social and evolutionary systems, there is a large
number of agents or players in mutual interaction. These individual agents
interact with the same rules and can also change partners over time. These
situations are difficult to interpret under the infinite memory hypothesis
implicitly associated with the concept of mixed strategies.

Following this point of view, to evaluate a non-cooperative and symmetric
game and its possible deterministic strategies (short memory), we
adopt the point of view of the statistical ensembles of statistical mechanics.
We suppose first that we have an infinite system composed of independent
subensembles, where in each subensemble we have two players playing the
same game with payoff matrix $A$. We call this ensemble of independent
games the uniform ensemble ([15, p. 56]) of the game. This uniform
ensemble is characterized by the payoff matrix $A$, and the players P and Q are
the representative agents of the ensemble of the game.

In each subensemble, a game with payoff matrix $A$ is played, and
subensembles are characterized by the mean payoffs per move of both players. The
global properties of the game will be described by the mean payoffs per move
averaged over all the subensembles. We say that the representative players
P and Q of the game have mean payoffs per move $G_p$ and $G_q$, respectively,
where the average is taken over all the subensembles.

To evaluate a game, we first consider that each player chooses its pure
strategies with equal probabilities, and each subensemble is characterized by
the two sequences of pure strategies $\mu$ and $\sigma$. The properties of the game are
determined by $G_p$ and $G_q$.

To evaluate a deterministic strategy or policy, we consider that in each
subensemble game, P plays with the deterministic policy $f$, and Q has a
game record $\sigma \in S^{\mathbb{N}}$. Defining an ensemble probability density function $\rho_q$
for the occurrence of game record $\sigma$ for player Q, the ensemble of games
will be characterized by the mean payoff per move of both players averaged
over the set of all allowed sequences $\sigma \in S^{\mathbb{N}}$ with probability measure $\rho_q$.
These averages depend on $f$ and $\rho_q$, and we can compare the performance of
a policy with the case where the players have no policies. Within the same
memory class, we use these ensemble averages to compare the mean payoffs
for different policies.

When both players have a deterministic policy, the mean payoffs per move 
of the players depend on the finite number of initial conditions of the game. 



3 The uniform ensemble of a game 

We consider an ensemble of subsystems, where in each subsystem there are
two players playing the same game. We denote the game records of players P
and Q by $\mu$ and $\sigma$, respectively. As $\mu$ and $\sigma$ are infinite sequences of '0' and
'1', we can identify $\mu$ and $\sigma$ with real numbers in the interval $[0, 1]$ through,

$$x = \sum_{i=1}^{\infty} \frac{\mu_i}{2^i}, \qquad y = \sum_{i=1}^{\infty} \frac{\sigma_i}{2^i} \tag{3.1}$$

Relations (3.1) define a map $S^{\mathbb{N}} \to [0, 1]$. This map is an isomorphism,
except when $(\sigma_1, \sigma_2, \ldots)$ or $(\mu_1, \mu_2, \ldots)$ represents dyadic rational numbers,
[16]. As the set of dyadic rationals has zero Lebesgue measure, the infinite
sequence of moves of both players can be represented, almost everywhere, by
two real numbers $x, y \in [0, 1]$. Therefore, the interval $[0, 1]$ is naturally the
space of game records.
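The identification between game records and real numbers can be sketched for finite prefixes as follows. The function names are hypothetical, and exact fractions are used so that the dyadic ambiguity mentioned above becomes visible: a record ending in an unbroken run of '1's approaches the same number as a record ending in '0's.

```python
from fractions import Fraction

def record_to_number(record):
    """Partial sum of (3.1): sum of record[i-1] / 2**i over the given moves."""
    return sum(Fraction(bit, 2 ** i) for i, bit in enumerate(record, start=1))

def number_to_record(x, length):
    """First `length` binary digits of a number x in [0, 1)."""
    x = Fraction(x)
    digits = []
    for _ in range(length):
        x *= 2
        bit = 1 if x >= 1 else 0
        digits.append(bit)
        x -= bit
    return digits

print(record_to_number([1, 0, 1]))            # the record 1,0,1
print(number_to_record(Fraction(5, 8), 3))    # and back again
# Two different records whose values approach the dyadic rational 1/2:
print(record_to_number([1, 0, 0, 0]), record_to_number([0] + [1] * 20))
```

The last line shows why the map fails to be one-to-one only on the dyadic rationals, a set of zero Lebesgue measure.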

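Making this identification between game records and real numbers, in
each subensemble game, by (2.4), the mean payoffs per move of the players
are,

$$\begin{aligned} G_p(\mu, \sigma) &= \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} A_{\mu_i\sigma_i} := G_p(x, y) \\ G_q(\mu, \sigma) &= \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} A_{\sigma_i\mu_i} := G_q(x, y) \end{aligned} \tag{3.2}$$

where $x, y \in [0, 1]$.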

As each subensemble game is independent of the others, and each player's
move is independent of the history of the game, we can assign ensemble
probability density functions to the game records. Let $\rho_p(x)$ and $\rho_q(y)$ be the
ensemble probability density functions of game records of the representative
players P and Q, respectively. For example, $\rho_p(x)\,dx$ is the probability of
finding a subensemble with player P with a game record in an interval of
length $dx$ centred around $x$.

Assuming further that all the game records are equally probable, $\rho_p(x) =
1$ and $\rho_q(y) = 1$, the ensemble averages of the mean payoffs per move are,

$$\begin{aligned} \bar{G}_p &= \int_0^1\!\!\int_0^1 G_p(x, y)\rho_p(x)\rho_q(y)\,dx\,dy = \int_0^1\!\!\int_0^1 G_p(x, y)\,dx\,dy \\ \bar{G}_q &= \int_0^1\!\!\int_0^1 G_q(x, y)\rho_p(x)\rho_q(y)\,dx\,dy = \int_0^1\!\!\int_0^1 G_q(x, y)\,dx\,dy \end{aligned} \tag{3.3}$$

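To characterize the statistical ensemble of a non-cooperative and
symmetric game with payoff matrix $A$, we now calculate the integrals in (3.3).
We consider the sequences of functions,

$$G_p^n(\mu, \sigma) = \frac{1}{n} \sum_{i=1}^{n} A_{\mu_i\sigma_i}, \qquad G_q^n(\mu, \sigma) = \frac{1}{n} \sum_{i=1}^{n} A_{\sigma_i\mu_i} \tag{3.4}$$

As $n \to \infty$, $G_p^n(\mu, \sigma) \to G_p(\mu, \sigma)$, and $G_q^n(\mu, \sigma) \to G_q(\mu, \sigma)$. In the sense of
Lebesgue integration, the integrals in (3.3) can be calculated as the limits of
the integrals of the functions $G_p^n(\mu, \sigma)$ and $G_q^n(\mu, \sigma)$.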

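Let us first take $n = 1$. By (3.2), (3.3) and (3.4), we have,

$$\begin{aligned} \bar{G}_p^1 &= \int_0^1\!\!\int_0^1 A_{\mu_1\sigma_1}\,dx\,dy \\ \bar{G}_q^1 &= \int_0^1\!\!\int_0^1 A_{\sigma_1\mu_1}\,dx\,dy \end{aligned} \tag{3.5}$$

where $\mu_1$ and $\sigma_1$ are the first digits in the binary developments of $x$ and
$y$, both in the interval $[0, 1]$. Therefore, the functions $A_{\mu_1\sigma_1} = A_{\mu_1\sigma_1}(x, y)$
and $A_{\sigma_1\mu_1} = A_{\sigma_1\mu_1}(x, y)$ are piecewise constant in the unit square, and the
integrals in (3.5) are straightforwardly evaluated to,

$$\begin{aligned} \bar{G}_p^1 &= \int_0^1\!\!\int_0^1 A_{\mu_1\sigma_1}\,dx\,dy = \frac{1}{2^2}(A_{00} + A_{01} + A_{10} + A_{11}) \\ \bar{G}_q^1 &= \int_0^1\!\!\int_0^1 A_{\sigma_1\mu_1}\,dx\,dy = \frac{1}{2^2}(A_{00} + A_{01} + A_{10} + A_{11}) \end{aligned} \tag{3.6}$$

Note that the functions $A_{\mu_1\sigma_1}(x, y)$ and $A_{\sigma_1\mu_1}(x, y)$ are piecewise constant
functions from $[0, 1] \times [0, 1]$ to the set $\{A_{00}, A_{01}, A_{10}, A_{11}\}$.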
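In general, by (3.4) and (3.3),

$$\begin{aligned} \bar{G}_p^{n+1} &= \frac{n}{n+1}\bar{G}_p^n + \frac{1}{n+1} \int_0^1\!\!\int_0^1 A_{\mu_{n+1}\sigma_{n+1}}\,dx\,dy \\ \bar{G}_q^{n+1} &= \frac{n}{n+1}\bar{G}_q^n + \frac{1}{n+1} \int_0^1\!\!\int_0^1 A_{\sigma_{n+1}\mu_{n+1}}\,dx\,dy \end{aligned} \tag{3.7}$$

The functions $A_{\mu_n\sigma_n} = A_{\mu_n\sigma_n}(x, y)$ and $A_{\sigma_n\mu_n} = A_{\sigma_n\mu_n}(x, y)$ are piecewise
constant and assume the constant values $A_{00}$, $A_{01}$, $A_{10}$ and $A_{11}$ in squares of
side $1/2^n$. As, for each pair of indices $(\sigma_n, \mu_n)$, the domain where $A_{\sigma_n\mu_n}(x, y)$
is constant is composed of $2^{2(n-1)}$ disjoint squares, we have,

$$\begin{aligned} \int_0^1\!\!\int_0^1 A_{\mu_n\sigma_n}\,dx\,dy &= \int_0^1\!\!\int_0^1 A_{\sigma_n\mu_n}\,dx\,dy \\ &= 2^{2(n-1)} \frac{1}{2^{2n}}(A_{00} + A_{01} + A_{10} + A_{11}) \\ &= \frac{1}{2^2}(A_{00} + A_{01} + A_{10} + A_{11}) \end{aligned} \tag{3.8}$$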
Introducing (3.8) into (3.7), by induction, and taking the limit $n \to \infty$, we
obtain the values of the ensemble average of the mean payoffs per move of
each player:

Theorem 3.1. We consider an ensemble of non-cooperative and symmetric
two-player games, where in each subensemble we have two players making
their choices with equal probabilities. Assume that each player's move is
independent of the history of the game and that the ensemble probability density
functions of each representative player are uniform in the interval $[0, 1]$ of the
game records. Then, the mean payoffs per move of the representative players
of the game are equal and are given by,

$$\bar{G}_p = \bar{G}_q = \frac{1}{4}(A_{00} + A_{01} + A_{10} + A_{11})$$

where the $A_{ij}$ are the entries of the payoff matrix.

In the uniform statistical ensemble of a non-cooperative and symmetric 
game with all the players choosing their pure strategies with equal probabil- 
ities, the average payoff per move is equal to the average value of the entries 
of the payoff matrix A. 

These elementary results can be straightforwardly generalized to non- 
cooperative and non-symmetric n-player games. 
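For finite records, the statement of Theorem 3.1 can be checked by brute force: averaging the $n$-move mean payoffs over all equally likely pairs of records gives the average of the entries of the payoff matrix for every $n$. The sketch below is an illustration under that finite truncation; the function name is hypothetical, and integer payoffs are assumed so that the arithmetic is exact.

```python
from fractions import Fraction
from itertools import product

def ensemble_average(A, n):
    """Average of the n-move mean payoffs over all pairs of length-n game records."""
    records = list(product((0, 1), repeat=n))
    total_p = total_q = Fraction(0)
    for mu in records:
        for sigma in records:
            total_p += Fraction(sum(A[m][s] for m, s in zip(mu, sigma)), n)
            total_q += Fraction(sum(A[s][m] for m, s in zip(mu, sigma)), n)
    count = len(records) ** 2
    return total_p / count, total_q / count

A_pd = [[3, -9], [11, -5]]  # Prisoner's Dilemma entries consistent with (2.12)
for n in (1, 2, 3):
    print(n, ensemble_average(A_pd, n))
```

The enumeration grows as $4^n$, so it is only practical for short records; it is meant as a check of the induction step, not as a replacement for it.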

4 Evaluating deterministic strategies 

To evaluate the performance of a deterministic strategy in an iterated game,
we first enumerate the class of all the boolean functions $f : S^m \to S$, where
$S = \{0, 1\}$. These boolean functions describe all the possible deterministic
strategies.

For each class of functions with memory length $m$, there are exactly $2^{2^m}$
different functions. To enumerate a deterministic policy function within a
memory class $m$, $f(\sigma^{(m)})$, where $\sigma^{(m)} = (\sigma_1, \ldots, \sigma_m) \in \{0, 1\}^m$, we introduce
an additional index $n$. Within each memory class $m$, each possible policy
function will be denoted by $f_{m,n}$, where $n = \sum_{i=0}^{2^m-1} f_{m,n}(\sigma_i^{(m)})\,2^i$ is the policy
number, $\sigma_{i+1}^{(m)} = \sigma_i^{(m)} + (0, 0, \ldots, 1)$, $\sigma_0^{(m)} = (0, 0, \ldots, 0)$, and the "plus"
symbols must be understood in the sense of binary arithmetic. For example, in
Table 1 we show all the possible deterministic policy functions with memory
length $m = 1$.

In this case, the deterministic TFT policy corresponds to the boolean
function $f_{1,2}$. The functions $f_{1,2}$ and $f_{1,1}$ are GTFT policies with memory
length $m = 1$.




Policy       f_{1,n}(0)    f_{1,n}(1)
f_{1,0}          0             0
f_{1,1}          1             0
f_{1,2}          0             1
f_{1,3}          1             1

Table 1: Deterministic policy functions for m = 1.
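The numbering of policies can be made concrete with a short sketch. It assumes that the histories $\sigma_i^{(m)}$ are ordered by binary counting starting from $(0, \ldots, 0)$, as described above, and that bit $i$ of the policy number $n$ is the value of the policy on the $i$-th history. The names `policy_table` and `is_gtft` are hypothetical.

```python
from itertools import product

def policy_table(m, n):
    """Lookup table of f_{m,n}: maps each history in {0,1}^m to a move.

    Histories are listed in binary counting order, and the i-th binary
    digit of n is the value of the policy on the i-th history.
    """
    histories = list(product((0, 1), repeat=m))
    if not 0 <= n < 2 ** len(histories):
        raise ValueError("policy number out of range")
    return {h: (n >> i) & 1 for i, h in enumerate(histories)}

def is_gtft(table):
    """A policy is GTFT when it outputs '0' and '1' equally often."""
    values = list(table.values())
    return values.count(0) == values.count(1)

print(policy_table(1, 2))                                    # tit-for-tat
print([n for n in range(4) if is_gtft(policy_table(1, n))])  # GTFT policies, m = 1
print(sum(is_gtft(policy_table(2, n)) for n in range(16)))   # number of GTFT policies, m = 2
```

For memory length 2 there are 16 policies, and the GTFT ones are those whose table contains exactly two '0's and two '1's.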

Suppose now that the representative player P of a game has a policy $f_{m,n}$
and the opponent player Q can have any game sequence $\sigma = (\sigma_1, \sigma_2, \ldots)$.
Then, by (2.4), for the infinitely iterated game, the mean payoff per move
for each player is,

$$\begin{aligned} G_p(\sigma) &= \lim_{M\to\infty} \frac{1}{M} \sum_{i=1}^{M} A_{\mu_i\sigma_i} = \lim_{M\to\infty} \frac{1}{M-m} \sum_{i=m+1}^{M} A_{f_{m,n}(\sigma_{i-m}, \ldots, \sigma_{i-1})\,\sigma_i} \\ G_q(\sigma) &= \lim_{M\to\infty} \frac{1}{M} \sum_{i=1}^{M} A_{\sigma_i\mu_i} = \lim_{M\to\infty} \frac{1}{M-m} \sum_{i=m+1}^{M} A_{\sigma_i\,f_{m,n}(\sigma_{i-m}, \ldots, \sigma_{i-1})} \end{aligned} \tag{4.1}$$

and $G_p$ and $G_q$ are functions of $\sigma = (\sigma_1, \sigma_2, \ldots)$ and $f_{m,n}$. In the first $m$
iterations of the game, the accumulated mean payoffs depend on the initial
choices of the players. However, in the limit $M \to \infty$, the dependence on
the initial choices vanishes.

Let us take the infinite sequence $(\sigma_1, \sigma_2, \ldots) \in S^{\mathbb{N}}$ characterizing one of
the possible outcomes of the choices of the player Q, and define the real
number,

$$y = \sum_{i=1}^{\infty} \frac{\sigma_i}{2^i} \tag{4.2}$$

With this identification between infinite sequences of zeros and ones and
real numbers in the interval $[0, 1]$, we write the mean payoffs as,

$$\begin{aligned} G_p(\sigma) &= G_p(y; f_{m,n}) := P_{m,n}(y) \\ G_q(\sigma) &= G_q(y; f_{m,n}) := Q_{m,n}(y) \end{aligned} \tag{4.3}$$

Let us suppose now that we are in the framework of statistical mechanics and
we have an ensemble or collectivity of players P and Q. In each subensemble
of the collectivity, the player P plays according to strategy $f_{m,n}$ and Q has
some game record $y \in [0, 1]$. Suppose additionally that all the subensembles
of the collectivity are independent.

As each member of the collectivity is independent of the others, we can
assign an ensemble density function $\rho_q(y)$ to the collectivity. The function
$\rho_q(y)$ is the probability density of the game record $y$ of player Q. If $\rho_q(y) = 1$,
all the game records of Q are equally probable. The uniform ensemble of the
game can then be characterized by the ensemble mean payoffs per move and
per player,

$$\begin{aligned} \bar{P}_{m,n} &= \int_0^1 P_{m,n}(y)\rho_q(y)\,dy = \int_0^1 P_{m,n}(y)\,dy \\ \bar{Q}_{m,n} &= \int_0^1 Q_{m,n}(y)\rho_q(y)\,dy = \int_0^1 Q_{m,n}(y)\,dy \end{aligned} \tag{4.4}$$

In Fig. 2 we show the mean payoff functions $P_{m,n}(y)$ and $Q_{m,n}(y)$ for
the TFT policy $f_{1,2}$ and payoff matrix (2.5) of the Prisoner's Dilemma game.
These functions have been calculated numerically from (4.1), (4.2) and (4.3).

Figure 2: Mean payoffs per move $P_{1,2}(y)$ and $Q_{1,2}(y)$ as a function of the
game record $y \in [0, 1]$ of the player Q, in the Prisoner's Dilemma game with
payoff matrix (2.5). Player P has TFT policy $f_{1,2}$.

To calculate the mean payoffs $\bar{P}_{m,n}$ and $\bar{Q}_{m,n}$ given by (4.4), we first
approximate the functions $P_{m,n}(y)$ and $Q_{m,n}(y)$ by sequences of piecewise
constant functions. By (4.1)-(4.3), we define the sequences of functions,
$\{P_{m,n}^M(y)\}_{M \geq m+1}$ and $\{Q_{m,n}^M(y)\}_{M \geq m+1}$, as,

$$\begin{aligned} P_{m,n}^M(y) &= \frac{1}{M-m} \sum_{i=m+1}^{M} A_{f_{m,n}(\sigma_{i-m}, \ldots, \sigma_{i-1})\,\sigma_i} \\ Q_{m,n}^M(y) &= \frac{1}{M-m} \sum_{i=m+1}^{M} A_{\sigma_i\,f_{m,n}(\sigma_{i-m}, \ldots, \sigma_{i-1})} \end{aligned} \tag{4.5}$$

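where $(\sigma_1, \sigma_2, \ldots)$ is the binary development of $y$. For $M \geq m + 1$, $P_{m,n}^M(y)$
and $Q_{m,n}^M(y)$ are piecewise constant functions in the interval $[0, 1]$, and, in
the limit $M \to \infty$, they converge almost everywhere to $P_{m,n}(y)$ and $Q_{m,n}(y)$,
Fig. 2. In the sense of Lebesgue integration, this implies that,

$$\begin{aligned} \lim_{M\to\infty} \int_0^1 P_{m,n}^M(y)\,dy &= \bar{P}_{m,n} \\ \lim_{M\to\infty} \int_0^1 Q_{m,n}^M(y)\,dy &= \bar{Q}_{m,n} \end{aligned} \tag{4.6}$$

Let us now calculate the integrals in (4.6). For $M = m + 1$, by (4.5), we
have,

$$\begin{aligned} P_{m,n}^{m+1}(y) &= A_{f_{m,n}(\sigma_1, \ldots, \sigma_m)\,\sigma_{m+1}} \\ Q_{m,n}^{m+1}(y) &= A_{\sigma_{m+1}\,f_{m,n}(\sigma_1, \ldots, \sigma_m)} \end{aligned} \tag{4.7}$$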
As $(\sigma_1, \ldots, \sigma_{m+1})$ represents the first $m + 1$ terms of the binary 
development of $y$, the functions in (4.7) assume constant values in 
subintervals of $[0, 1]$ of length $1/2^{m+1}$. In each of these subintervals, 
$P^{m+1}_{m,n}(y)$ and $Q^{m+1}_{m,n}(y)$ assume one of the four values: $A_{00}$, 
$A_{01}$, $A_{10}$ and $A_{11}$. 
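
The piecewise constant approximations (4.5) can be evaluated directly from a finite block of binary digits of y. The Python sketch below is illustrative only: the function names, the sample payoff matrix and the encoding of the policy number n (bit k of n is the move played after the memory block whose binary value is k, oldest digit first) are assumptions made here, not definitions taken from the text. For m = 1 this encoding gives tit-for-tat for n = 2 and "always 1" for n = 3, in agreement with the text.

```python
from fractions import Fraction

def policy_move(n, memory):
    """Move prescribed by the policy with truth table n (assumed bit encoding)."""
    k = int("".join(str(bit) for bit in memory), 2)  # memory block read as a binary number
    return (n >> k) & 1

def partial_means(A, m, n, sigma):
    """Return (P^M, Q^M) of (4.5) for the first M = len(sigma) binary digits of y."""
    M = len(sigma)
    if M < m + 1:
        raise ValueError("need at least m + 1 digits")
    p_total = q_total = 0
    for i in range(m, M):                        # 0-based position of digit sigma_{i+1}
        p_move = policy_move(n, sigma[i - m:i])  # P answers the previous m moves of Q
        p_total += A[p_move][sigma[i]]           # payoff of P: A_{f(...) sigma}
        q_total += A[sigma[i]][p_move]           # payoff of Q: A_{sigma f(...)}
    return Fraction(p_total, M - m), Fraction(q_total, M - m)

# Hypothetical payoff matrix, used only to exercise the function.
A = [[1, 2], [3, 4]]
p4, q4 = partial_means(A, 1, 2, [0, 1, 1, 0])    # tit-for-tat against the digits 0, 1, 1, 0
first = partial_means(A, 1, 2, [1, 0])           # M = m + 1, the case of (4.7)
```

With M = m + 1 the function returns a single matrix entry for each player, which is the content of (4.7).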

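Associated to each deterministic policy function $f_{m,n}$, we define the 
numbers, 

\[
\begin{aligned}
n_0^{m,n} &= \#\{\sigma^{(m)} \in \{0,1\}^m : f_{m,n}(\sigma^{(m)}) = 0\} \\
n_1^{m,n} &= \#\{\sigma^{(m)} \in \{0,1\}^m : f_{m,n}(\sigma^{(m)}) = 1\}
\end{aligned}
\tag{4.8}
\]

and $n_0^{m,n} + n_1^{m,n} = 2^m$. 

Under these conditions, by (4.7) and (4.8), we have, 

\[
\begin{aligned}
\int_0^1 P^{m+1}_{m,n}(y)\,dy &= \frac{1}{2^{m+1}} \left( n_0^{m,n}(A_{00} + A_{01}) + n_1^{m,n}(A_{10} + A_{11}) \right) \\
\int_0^1 Q^{m+1}_{m,n}(y)\,dy &= \frac{1}{2^{m+1}} \left( n_0^{m,n}(A_{00} + A_{10}) + n_1^{m,n}(A_{01} + A_{11}) \right)
\end{aligned}
\tag{4.9}
\]

For $M \ge m + 1$, by (4.5), we have, 

\[
\begin{aligned}
P^{M+1}_{m,n}(y) &= \frac{M-m}{M-m+1}\,P^M_{m,n}(y) + \frac{1}{M+1-m}\,A_{f_{m,n}(\sigma_{M+1-m},\ldots,\sigma_M)\,\sigma_{M+1}} \\
Q^{M+1}_{m,n}(y) &= \frac{M-m}{M-m+1}\,Q^M_{m,n}(y) + \frac{1}{M+1-m}\,A_{\sigma_{M+1}\,f_{m,n}(\sigma_{M+1-m},\ldots,\sigma_M)}
\end{aligned}
\tag{4.10}
\]

and, as in (4.9), 

\[
\begin{aligned}
\int_0^1 P^{M+1}_{m,n}(y)\,dy &= \frac{M-m}{M-m+1} \int_0^1 P^M_{m,n}(y)\,dy
  + \frac{1}{2^{m+1}(M+1-m)} \left( n_0^{m,n}(A_{00} + A_{01}) + n_1^{m,n}(A_{10} + A_{11}) \right) \\
\int_0^1 Q^{M+1}_{m,n}(y)\,dy &= \frac{M-m}{M-m+1} \int_0^1 Q^M_{m,n}(y)\,dy
  + \frac{1}{2^{m+1}(M+1-m)} \left( n_0^{m,n}(A_{00} + A_{10}) + n_1^{m,n}(A_{01} + A_{11}) \right)
\end{aligned}
\tag{4.11}
\]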

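By (4.9) and by induction from (4.11), we obtain, 

\[
\begin{aligned}
\int_0^1 P^{M+1}_{m,n}(y)\,dy &= \frac{1}{2^{m+1}} \left( n_0^{m,n}(A_{00} + A_{01}) + n_1^{m,n}(A_{10} + A_{11}) \right) \\
\int_0^1 Q^{M+1}_{m,n}(y)\,dy &= \frac{1}{2^{m+1}} \left( n_0^{m,n}(A_{00} + A_{10}) + n_1^{m,n}(A_{01} + A_{11}) \right)
\end{aligned}
\tag{4.12}
\]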
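
The induction step behind (4.12) can be written out explicitly. Here $c$ is a shorthand, introduced only for this remark, for the right-hand side of the first equation in (4.9). Assuming that the integral of $P^M_{m,n}$ equals $c$, the recurrence (4.11) gives:

```latex
\int_0^1 P^{M+1}_{m,n}(y)\,dy
  = \frac{M-m}{M-m+1}\,c + \frac{1}{M+1-m}\,c
  = \frac{(M-m)+1}{M-m+1}\,c
  = c ,
\qquad
c = \frac{1}{2^{m+1}} \left( n_0^{m,n}(A_{00}+A_{01}) + n_1^{m,n}(A_{10}+A_{11}) \right).
```

The same computation, with the second equation of (4.9), applies to the integral of $Q^{M+1}_{m,n}$.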

As the integrals in (4.12) are independent of $M$, by (4.6), we have proved: 

Theorem 4.1. We consider an ensemble of non-cooperative and symmetric 
two-player games, where in each subensemble we have a player P playing 
with deterministic policy $f_{m,n}$, and a player Q making the choices of pure 
strategies with equal probabilities. Then, the mean payoffs per move of the 
representative players of the game depend on the payoff matrix $A$ and on the 
strategy $f_{m,n}$, and the mean payoffs per move are, 

\[
\begin{aligned}
\bar{P}_{m,n} &= \frac{1}{2^{m+1}} \left( n_0^{m,n}(A_{00} + A_{01}) + n_1^{m,n}(A_{10} + A_{11}) \right) \\
\bar{Q}_{m,n} &= \frac{1}{2^{m+1}} \left( n_0^{m,n}(A_{00} + A_{10}) + n_1^{m,n}(A_{01} + A_{11}) \right)
\end{aligned}
\]

where the $A_{ij}$ are the entries of the payoff matrix, 
$n_0^{m,n} = \#\{\sigma^{(m)} \in \{0,1\}^m : f_{m,n}(\sigma^{(m)}) = 0\}$, and 
$n_1^{m,n} = \#\{\sigma^{(m)} \in \{0,1\}^m : f_{m,n}(\sigma^{(m)}) = 1\}$. 
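
Theorem 4.1 can be checked numerically by averaging one move over all equally likely blocks of m + 1 binary digits, which is the computation behind (4.9). The following Python sketch is a minimal check under stated assumptions: the function names, the sample payoff matrix and the truth-table encoding of n (bit k of n is the reply to the memory block with binary value k) are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product

def theorem_means(A, m, n):
    """Closed-form mean payoffs of Theorem 4.1 for the policy with truth table n."""
    n1 = bin(n).count("1")            # memory blocks answered with strategy 1
    n0 = 2**m - n1                    # memory blocks answered with strategy 0
    scale = Fraction(1, 2**(m + 1))
    p_mean = scale * (n0 * (A[0][0] + A[0][1]) + n1 * (A[1][0] + A[1][1]))
    q_mean = scale * (n0 * (A[0][0] + A[1][0]) + n1 * (A[0][1] + A[1][1]))
    return p_mean, q_mean

def enumerated_means(A, m, n):
    """Average one move over all 2^(m+1) equally likely digit blocks of Q."""
    p_total = q_total = 0
    for block in product((0, 1), repeat=m + 1):
        k = int("".join(str(bit) for bit in block[:m]), 2)
        p_move = (n >> k) & 1         # assumed encoding of the policy number n
        p_total += A[p_move][block[m]]
        q_total += A[block[m]][p_move]
    return Fraction(p_total, 2**(m + 1)), Fraction(q_total, 2**(m + 1))

A = [[1, 2], [3, 4]]                  # hypothetical payoff matrix
agree = all(theorem_means(A, 2, n) == enumerated_means(A, 2, n) for n in range(16))
```

The closed form depends on n only through the number of 1 bits in its truth table, so it does not depend on how the memory blocks are ordered.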

This theorem has a direct consequence. With the definitions given earlier, 
a policy or strategy $f_{m,n}$ is equalitarian if the mean payoffs of the 
representative players are equal. Imposing the equality between $\bar{P}_{m,n}$ 
and $\bar{Q}_{m,n}$ in Theorem 4.1, we obtain, 

\[
n_0^{m,n}(A_{01} - A_{10}) + n_1^{m,n}(A_{10} - A_{01}) = 0
\tag{4.13}
\]

From (4.13) it follows that a policy is equalitarian if either 
$n_0^{m,n} = n_1^{m,n}$ or $A_{01} = A_{10}$. In the first case, we have the class of 
all GTFT policies, independently of the values of the entries of the payoff 
matrix $A$. If $A_{01} = A_{10}$, it follows from Theorem 4.1 and (4.13) that, 

\[
\bar{P}_{m,n} = \bar{Q}_{m,n} = \frac{n_0^{m,n}}{2^{m+1}}(A_{00} - A_{11}) + \frac{1}{2}A_{11} + \frac{1}{2}A_{01}
\tag{4.14}
\]

where we have introduced the relation $n_1^{m,n} = 2^m - n_0^{m,n}$. Therefore, we 
have: 

Corollary 4.2. We consider an ensemble of non-cooperative and symmetric 
two-player games with payoff matrix $A$, where in each subensemble we have 
a player P playing strategy $f_{m,n}$, and a player Q making the choices of pure 
strategies with equal probabilities. Then the policy $f_{m,n}$ is equalitarian if 
either it is GTFT or $A_{01} = A_{10}$. Moreover, the payoffs per move of GTFT 
policies are given by, 

\[
\bar{P}_{m,n} = \bar{Q}_{m,n} = \frac{1}{4}\left( A_{00} + A_{01} + A_{10} + A_{11} \right)
\]

For example, in games with memory length $m = 1$, independently of the 
payoff matrix $A$, the equalitarian strategies are $f_{1,1}$ and $f_{1,2}$, both GTFT. 
From the point of view of the ensemble mean payoffs per move, all the GTFT 
strategies are equivalent to ensemble games where all players play randomly 
with equal probability. 
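
Condition (4.13) can be used to list the equalitarian policies for a given memory length. The sketch below is illustrative: the function names and the sample matrices are hypothetical, and the policy number n is assumed to be a truth table whose 1 bits count the memory blocks answered with strategy 1.

```python
def is_equalitarian(A, m, n):
    """Test condition (4.13) for the policy with truth table n (assumed encoding)."""
    n1 = bin(n).count("1")
    n0 = 2**m - n1
    return n0 * (A[0][1] - A[1][0]) + n1 * (A[1][0] - A[0][1]) == 0

def equalitarian_policies(A, m):
    """All policy numbers n of memory length m that satisfy (4.13)."""
    return [n for n in range(2**(2**m)) if is_equalitarian(A, m, n)]

asymmetric = [[1, 2], [3, 4]]   # hypothetical matrix with A_01 != A_10
symmetric = [[1, 2], [2, 4]]    # hypothetical matrix with A_01 == A_10

gtft_m1 = equalitarian_policies(asymmetric, 1)
gtft_m2 = equalitarian_policies(asymmetric, 2)
all_m1 = equalitarian_policies(symmetric, 1)
```

For the asymmetric matrix and m = 1 only n = 1 and n = 2 survive, matching the two GTFT policies named in the text. With a symmetric off-diagonal, every policy satisfies (4.13).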

We now determine the best policy for a player P with an opponent Q 
choosing pure strategies with equal probabilities. By Theorem 4.1 and with 
$n_1^{m,n} = 2^m - n_0^{m,n}$, we obtain, 

\[
\bar{P}_{m,n} = \frac{n_0^{m,n}}{2^{m+1}}\left( A_{00} + A_{01} - A_{11} - A_{10} \right) + \frac{1}{2}\left( A_{11} + A_{10} \right)
\tag{4.15}
\]

Therefore, in the sense of ensemble average and for a given memory length 
$m$, the best policies for the player P are the ones that maximise (4.15), for 
all the choices of the integers $n_0^{m,n} = 0, \ldots, 2^m$. 
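
Since (4.15) is linear in $n_0^{m,n}$, its maximum over the allowed integers sits at an end point unless the coefficient vanishes. A minimal Python sketch of this selection follows; the function names and the two sample matrices are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def mean_payoff_P(A, m, n0):
    """Right-hand side of (4.15) for a given n0 in {0, ..., 2^m}."""
    slope = A[0][0] + A[0][1] - A[1][1] - A[1][0]
    return Fraction(n0, 2**(m + 1)) * slope + Fraction(A[1][1] + A[1][0], 2)

def best_n0(A, m):
    """Values of n0 that maximise (4.15)."""
    slope = A[0][0] + A[0][1] - A[1][1] - A[1][0]
    if slope > 0:
        return [2**m]                 # always answer with strategy 0
    if slope < 0:
        return [0]                    # always answer with strategy 1
    return list(range(2**m + 1))      # (4.15) does not depend on n0

A_down = [[1, 2], [3, 4]]             # hypothetical matrix, negative coefficient
A_up = [[5, 6], [1, 2]]               # hypothetical matrix, positive coefficient
```

For example, `best_n0(A_down, 1)` selects n0 = 0, and the corresponding payoff is the average of the second row of the matrix.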
5 Both players have deterministic strategies 



When the two representative players P and Q have deterministic strategies 
within the same memory class, their game records depend on the first $m$ 
moves of the players. As we have two players and $2^m$ different initial 
conditions for each player, for each choice of a pair of deterministic 
strategies, there are at most $2^{m+1}$ different payoffs per move for both 
players. As there are $2^{2^m}$ different boolean functions of memory length $m$, 
the maximum number of equilibrium states is $2^{2^{m+1}} \times 2^{m+1}$, which, 
for $m = 1$, is 64. 
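
The counting argument can be reproduced with a few lines of Python. This is only an arithmetic sketch; the function name is hypothetical, and the number of initial conditions is taken as stated in the text.

```python
def max_equilibrium_states(m):
    """Upper bound on the number of equilibrium states for memory length m."""
    policy_pairs = (2 ** (2 ** m)) ** 2   # pairs of boolean functions: 2^(2^(m+1))
    initial_conditions = 2 ** (m + 1)     # count of initial conditions used in the text
    return policy_pairs * initial_conditions
```

For m = 1 there are 16 pairs of policies and 4 initial conditions, which gives the bound of 64 quoted above.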

Let us now analyse in detail the case of memory length $m = 1$. If $\mu_1$ and 
$\sigma_1$ represent the choices for the first move of players P and Q, and P and 
Q have policies $f = f_{1,r}$ and $g = f_{1,s}$, respectively, their game records are, 

\[
\begin{aligned}
P &: (\mu_1,\ f(\sigma_1),\ f \circ g(\mu_1),\ f \circ g \circ f(\sigma_1),\ \ldots) \\
Q &: (\sigma_1,\ g(\mu_1),\ g \circ f(\sigma_1),\ g \circ f \circ g(\mu_1),\ \ldots)
\end{aligned}
\]

where $f \circ g(\mu_1) = f(g(\mu_1))$. After a few moves, the game records become 
periodic. Therefore, the mean payoff per move of each player can be 
calculated from the periodic sequences, which depend on the initial moves and 
on the policies. For example, with $f = f_{1,2}$ and $g = f_{1,2}$, and initial moves 
$\mu_1 = 0$ and $\sigma_1 = 1$, we obtain the game records, 

P: (0,1,0,1,...) 
Q: (1,0,1,0,...) 

and the mean payoff per move of both players is $P = Q = (A_{01} + A_{10})/2$. 
But for the initial moves $\mu_1 = 0$ and $\sigma_1 = 0$, we have $P = Q = A_{00}$. 
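
The game records for memory length m = 1 can be generated by letting each player reply to the previous move of the opponent. The Python sketch below is a minimal illustration; the function names are hypothetical, and the policies are passed as ordinary functions of one bit.

```python
def game_records(f, g, mu1, sigma1, moves):
    """Return the first `moves` entries of the game records of P and Q for m = 1."""
    p_record, q_record = [mu1], [sigma1]
    for _ in range(moves - 1):
        p_next = f(q_record[-1])      # P replies to the previous move of Q
        q_next = g(p_record[-1])      # Q replies to the previous move of P
        p_record.append(p_next)
        q_record.append(q_next)
    return p_record, q_record

def tit_for_tat(previous):
    """Repeat the previous move of the opponent."""
    return previous

alternating = game_records(tit_for_tat, tit_for_tat, 0, 1, 4)
constant = game_records(tit_for_tat, tit_for_tat, 0, 0, 4)
```

The first call reproduces the alternating records of the example, and the second gives the constant records with payoff $A_{00}$ for both players.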

In Table 2 we show the mean payoffs per move and per player, for all 
the deterministic policies with memory length $m = 1$ and all four possible 
initial moves of the players. Counting the different values in 
the entries of the table, we conclude that, for $m = 1$, the number of 
equilibrium states is 7. For a given game, the best strategy and initial 
conditions are obtained by analyzing the entries of Table 2. Clearly, the best 
strategy depends on the entries of the payoff matrix of the game. 

In general, let $(\mu_1, \ldots, \mu_m)$ and $(\sigma_1, \ldots, \sigma_m)$ be the first $m$ moves of 
players P and Q, respectively. Suppose further that players P and Q choose the 
deterministic strategies $f_{m,n}$ and $g_{m,n}$, respectively. Iterating the game, 
after some transient iterations the game record sequences become periodic, and 
the mean payoffs per move and per player are easily calculated. If 
$(\mu_{i+1}, \ldots, \mu_{i+p})$ and $(\sigma_{i+1}, \ldots, \sigma_{i+p})$, for some $i \ge 1$, are the 
periodic patterns of period $p$ of the game record sequences, the mean payoffs 
per move of the players are, 

\[
\begin{aligned}
P &: \frac{1}{p}\left( A_{\mu_{i+1}\sigma_{i+1}} + \cdots + A_{\mu_{i+p}\sigma_{i+p}} \right) \\
Q &: \frac{1}{p}\left( A_{\sigma_{i+1}\mu_{i+1}} + \cdots + A_{\sigma_{i+p}\mu_{i+p}} \right)
\end{aligned}
\]
P : f_{1,0}
  (mu_1, sigma_1)   Q : f_{1,0}           Q : f_{1,1}           Q : f_{1,2}           Q : f_{1,3}
  any               A_00                  P : A_01; Q : A_10    A_00                  P : A_01; Q : A_10

P : f_{1,1}
  (mu_1, sigma_1)   Q : f_{1,0}           Q : f_{1,1}           Q : f_{1,2}           Q : f_{1,3}
  (0, 0)            P : A_10; Q : A_01    (A_11 + A_00)/2       B/4                   P : A_01; Q : A_10
  (0, 1)            P : A_10; Q : A_01    P : A_01; Q : A_10    B/4                   P : A_01; Q : A_10
  (1, 0)            P : A_10; Q : A_01    P : A_10; Q : A_01    B/4                   P : A_01; Q : A_10
  (1, 1)            P : A_10; Q : A_01    (A_11 + A_00)/2       B/4                   P : A_01; Q : A_10

P : f_{1,2}
  (mu_1, sigma_1)   Q : f_{1,0}           Q : f_{1,1}           Q : f_{1,2}           Q : f_{1,3}
  (0, 0)            A_00                  B/4                   A_00                  A_11
  (0, 1)            A_00                  B/4                   (A_10 + A_01)/2       A_11
  (1, 0)            A_00                  B/4                   (A_10 + A_01)/2       A_11
  (1, 1)            A_00                  B/4                   A_11                  A_11

P : f_{1,3}
  (mu_1, sigma_1)   Q : f_{1,0}           Q : f_{1,1}           Q : f_{1,2}           Q : f_{1,3}
  any               P : A_10; Q : A_01    P : A_10; Q : A_01    A_11                  A_11

Table 2: Mean payoffs per move of players P and Q as a function of the initial 
moves and policies with memory length $m = 1$. In the first and fourth tables, 
the mean payoffs per move of both players are independent of the initial moves. 
To simplify the notation, we have set $B = A_{00} + A_{01} + A_{10} + A_{11}$. When 
an entry shows only one payoff, this payoff is the same for both players. The 
TFT strategy corresponds to the deterministic strategy function $f_{1,2}$, and 
$f_{1,1}$ and $f_{1,2}$ are GTFT policies. 
and these mean payoffs are the equilibrium states of the game. For example, 
for $m = 2$, we have at most $2^{2^{2+1}} \times 2^{2+1} = 2048$ equilibrium states. 
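
The periodic-pattern computation can be sketched in Python by iterating the two policies until the pair of memory windows repeats and then averaging over the cycle. Everything named here is an assumption made for illustration: the function names, the truth-table encoding of the policy numbers (bit k of n is the reply to the memory block with binary value k, oldest move first), and the sample matrix, whose entries are chosen so that different combinations of the $A_{ij}$ give different numbers.

```python
from fractions import Fraction
from itertools import product

def reply(n, memory):
    """Move of the policy with truth table n after the given memory block."""
    k = int("".join(str(bit) for bit in memory), 2)
    return (n >> k) & 1

def cycle_means(A, m, n_p, n_q, mu, sigma):
    """Mean payoffs per move of P and Q over the periodic part of the game records."""
    p_rec, q_rec = list(mu), list(sigma)
    first_seen = {}
    while True:
        state = (tuple(p_rec[-m:]), tuple(q_rec[-m:]))
        if state in first_seen:           # the records are periodic from here on
            start = first_seen[state]
            break
        first_seen[state] = len(p_rec)    # index of the next move to be played
        p_next = reply(n_p, q_rec[-m:])   # P looks at the last m moves of Q
        q_next = reply(n_q, p_rec[-m:])   # Q looks at the last m moves of P
        p_rec.append(p_next)
        q_rec.append(q_next)
    cycle = list(zip(p_rec[start:], q_rec[start:]))
    period = len(cycle)
    p_mean = Fraction(sum(A[p][q] for p, q in cycle), period)
    q_mean = Fraction(sum(A[q][p] for p, q in cycle), period)
    return p_mean, q_mean

A = [[1, 10], [100, 1000]]                # hypothetical matrix with well separated entries
states = {cycle_means(A, 1, n_p, n_q, [mu], [sg])
          for n_p, n_q, mu, sg in product(range(4), range(4), (0, 1), (0, 1))}
```

For m = 1 the set `states` collects the distinct pairs of mean payoffs over all 16 pairs of policies and all 4 initial moves, and it has 7 elements, the number of equilibrium states stated in the text.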

6 Examples and policy analysis 

The formalism introduced in the previous sections leads to the evaluation of 
policies for an iterated game with a given payoff matrix A. In this context, 
we can forget the role of players P and Q and speak about the performance 
of the game, the performance of a deterministic strategy and the relative 
performance of two deterministic strategies. 

We analyze now two examples, the Prisoner's Dilemma game and the 
Hawk-Dove game. 

6.1 The Prisoner's Dilemma 

In the Prisoner's Dilemma game with payoff matrix (2.5), if all players make 
their choices with equal probabilities, by Theorem 3.1 the mean payoff per 
move and per player is zero. By Corollary 4.2, a player with a GTFT policy 
against a player choosing its pure strategies randomly also has zero mean 
payoff per move. This includes the simplest case of the tit-for-tat policy. 

From the point of view of the non-deterministic map $\beta : \mathcal{K} \to \mathcal{K}$, the 
situation of Theorem 3.1 corresponds to the equilibrium solution 
$(0, 0) \in \mathcal{K}$, Fig. 3a). 

In the case of Theorem 4.1 and for deterministic strategies with memory 
length $m = 1$, there are three equilibrium solutions for the Prisoner's 
Dilemma game. These equilibrium solutions are: $(-3, 7) \in \mathcal{K}$, 
$(0, 0) \in \mathcal{K}$ and $(3, -7) \in \mathcal{K}$, Fig. 3b). So, in a uniform collectivity, 
the players that choose the dominant strategy have a better payoff, provided 
their partners choose their strategies with equal probabilities. 
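
The three equilibrium solutions can be recovered from Theorem 4.1 by letting $n_0^{1,n}$ run over 0, 1 and 2. Matrix (2.5) is not reproduced in this section, so the Python sketch below uses a hypothetical stand-in, chosen only because its row and column sums reproduce the values quoted above. It should not be read as the matrix of the paper, and the function name is likewise an assumption.

```python
from fractions import Fraction

# Hypothetical stand-in for the payoff matrix (2.5); not taken from the source.
A = [[4, -10], [10, -4]]

def mean_payoffs(A, m, n0):
    """Mean payoffs of Theorem 4.1 as a function of n0, with n1 = 2^m - n0."""
    n1 = 2**m - n0
    scale = Fraction(1, 2**(m + 1))
    p_mean = scale * (n0 * (A[0][0] + A[0][1]) + n1 * (A[1][0] + A[1][1]))
    q_mean = scale * (n0 * (A[0][0] + A[1][0]) + n1 * (A[0][1] + A[1][1]))
    return p_mean, q_mean

solutions = sorted({mean_payoffs(A, 1, n0) for n0 in (0, 1, 2)})
```

Under this assumption, n0 = 0 (always playing '1') gives the pair (3, -7), n0 = 1 (the GTFT policies) gives (0, 0), and n0 = 2 (always playing '0') gives (-3, 7).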

Suppose now that both representative players P and Q adopt a policy 
with memory length $m = 1$. Analysing the results of Table 2, the best payoff 
per move for both players is obtained when players P and Q play tit-for-tat 
and both choose the initial strategy '0'. The best payoff per move is also 
obtained when at least one of the contenders chooses '0' and the other plays 
according to the tit-for-tat policy. In the case of policies of memory length 
$m = 1$, the tit-for-tat policy forces cooperation. If one of the players plays 
tit-for-tat and the other player chooses another strategy, tit-for-tat ensures 
that the payoffs per move of both players are equal and the second player 
is not able to increase its payoff per move. If both players choose a 
tit-for-tat policy, depending on the initial condition, we can have four different 
[Figure 3, panels a), b) and c): horizontal axes $x_n$, ticks from -10 to 10.]

Figure 3: State space $\mathcal{K}$ and equilibrium solutions of the iterated Prisoner's 
Dilemma game. In a), all the players in the collectivity choose their 
strategies with equal probabilities. In b), we have three possible equilibrium 
states. In each subensemble game one player follows a deterministic strategy 
with memory length $m = 1$, and the other chooses its pure strategies with equal 
probabilities. Depending on the adopted policy, we can have different 
equilibrium states. In c), all the players have chosen a deterministic strategy 
with memory length $m = 1$, and the game can have seven equilibrium states. 



equilibrium states of the game, Table 2 and Fig. 3c). In this case, two of 
them are the strict and the bargain Nash solutions. If one of the players 
always chooses the strategy '1', it corresponds to the deterministic policy 
$f_{1,3}$, and the outcome of the game against a tit-for-tat corresponds to the 
Nash strict equilibrium of the game. The seven equilibrium solutions of the 
Prisoner's Dilemma game are plotted in Fig. 3c). 

If P has to choose a policy against a player Q that plays its strategies 
with equal probabilities, by (4.15) and (2.5), the best policy for P is the one 
that maximizes, 

\[
\bar{P}_{m,n} = 3 - 12\,\frac{n_0^{m,n}}{2^{m+1}}
\]

Therefore, in the Prisoner's Dilemma game, the best policy corresponds to 
$n_0^{m,n} = 0$, which corresponds to the policy function $f_{m,2^{2^m}-1}$. In this 
case, we have $\bar{P}_{m,2^{2^m}-1} = 3$ and $\bar{Q}_{m,2^{2^m}-1} = -7$, Fig. 3b). 

A more detailed analysis of Fig. 3 shows that Nash bargain solutions 
and Nash strict solutions only exist when both players have deterministic 
policies. In the sense of ensemble averages, Nash solutions are not equilibrium 
solutions of a game. 
6.2 The Hawk-Dove game 

The Hawk-Dove game has been introduced by Maynard-Smith and Price [17] 
as a game theoretical basic model to describe animal conflicts. They have 
assumed two pure strategies: Hawk ('0') and Dove ('1'). A player chooses 
Hawk or '0' if he acts fiercely, and chooses Dove or '1' if he looks fierce 
and then retires. In the context of evolutionary biology, this game aims to 
explain the struggle for a territory whose payoff is related to the number 
of offspring. The payoff matrix of the Hawk-Dove game is, 

\[
\begin{pmatrix}
\frac{1}{2}(r - c) & r \\
0 & \frac{1}{2} r + e
\end{pmatrix}
\]

where r represents the reproductive value and c is the cost of injury. In this 
game the Hawk strategy is dominant, provided $c < r$ and $e < r/2$. If $e > 0$ 
and $e < r/2$, the Hawk-Dove game is also dilemmatic. Globally, the species 
has advantage if everybody acts Dove, which is the non-dominant strategy. 

If all players choose their strategies with equal probability, by Theorem 
3.1 the mean payoff per player and move is, 

\[
P = Q = \frac{1}{2}\left( r - \frac{1}{4} c \right) + e
\]

If $c < 4r + 8e$, then $P = Q > 0$ and the Hawk-Dove game shows advantage for 
the species. If $c > 4r + 8e$, the cost of injury is too high and globally the 
mean payoff per player and move is non-positive. 

If the representative players of the game choose a generalized tit-for-tat 
policy with memory length $m = 1$, and $c < r$, both players have a positive 
mean payoff per move. If the players choose not to be the first to play Hawk, 
they both obtain the highest mean payoffs per move. 

In the Hawk-Dove non-cooperative and symmetric game, the tit-for-tat 
policy or imitation of the adversary move implies a positive payoff for both 
players, provided the cost of injury is not too high (c < 4r + 8e). 

7 Conclusions 

The dynamics of an iterated game is described by a one-to-many map defined 
on a state space, [14]. Within this framework, the concept of mixed strategy 
leads to the definition of the equilibrium solution of a game. This equilibrium 
solution is obtained as the limit of the iterates of a one-to-many map. In 
general, for a specific game, the equilibrium solutions associated to the set of 
all mixed strategies span the state space of the game. The concepts of strict 
Nash equilibrium and bargain solutions of a game are discussed within this 
framework. 

In applications of game theory to economics, ethics, sociology, biology, 
physics, etc., it is sometimes easy to identify rules of behaviour and 
interactions between agents and to make guesses about payoffs. However, it is 
difficult to argue about the (infinite) memory of all the past choices of the 
players, and to ensure that opponent players remain the same during the whole 
iterated game. Therefore, the way of evaluating a game or a policy depends 
on the context in which the game is considered. 

In order to evaluate a game, we have introduced the concept of representative 
ensemble of a game. This technique has been applied to the global 
evaluation of a game, without any specific considerations about policies. In 
this evaluation, all the players make their choices of pure strategies with 
equal probabilities. In this case, we have shown that the mean payoffs per 
move of the players are the mean value of the entries of the payoff matrices 
of the game. 

To evaluate a deterministic policy with a finite memory length, we have 
calculated the mean payoffs per move of the players, for the case where one 
of the players has a deterministic policy and the other player chooses its 
pure strategies with equal probabilities. In this case, there exists a class 
of deterministic policies that forces equality of the mean payoffs per move 
of the players. This class of policies is the class of generalized tit-for-tat 
policies. When a representative player has a generalized tit-for-tat policy, in 
the limit of the iterated game, the payoffs of both representative players are 
equal. If a player tries to increase its payoff by changing its strategy and 
the other player plays tit-for-tat, the change in the strategy can increase or 
decrease the payoffs, but the payoffs per move of both players remain equal. 
Generalized tit-for-tat or imitation strategies force equalitarian payoffs per 
move. In dilemmatic games, the generalized tit-for-tat policy, together with 
the condition of not being the first to defect, leads to the highest possible 
mean payoffs per move for the players. 